Search for: All records

Creators/Authors contains: "Freire, Juliana"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Free, publicly-accessible full text available July 21, 2026
  2. Free, publicly-accessible full text available June 18, 2026
  3. Free, publicly-accessible full text available April 24, 2026
  4. Direct exploitation, which includes the trade of wild animals for their parts, is a major driver of extinction. Digital communication tools, particularly the internet, have facilitated the trade in endangered species. Here, we automatically collected data to analyze online sales of threatened animals across 148 English-text online marketplaces. We created a tool that searched for online sales of 13,267 animal species at risk of global extinction, as classified by the International Union for Conservation of Nature (IUCN), as well as 706 animal species on Appendix I of the Convention for International Trade in Endangered Species (CITES), for which international commercial trade is prohibited. Examining a period of 15 weeks in 2018, we identified 10,699 unique listings selling body parts or eggs of threatened species, of which 4131 contained a full species name (common or scientific). These 4131 results were then filtered by keywords and, finally, manually vetted, which yielded 546 sale listings for 83 species. Of these 546 listings, 61% advertised shark trophies (mainly jaws), 73% of which were taken from species listed as endangered or critically endangered. Just four websites hosted >95% of listings. We identified 18 species for sale that are included on CITES Appendix I. We also identified 13 species for which the IUCN had not identified intentional use as a threat. This work expands current understanding about the dealing of endangered and potentially illegal species online, specifies taxa threatened by online trade, and highlights emerging opportunities and persistent challenges to preventing the trafficking of threatened species.
    Free, publicly-accessible full text available April 1, 2026
  5. Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types which are fixed at training time; require a large number of training samples per type; incur high run-time inference costs; and their performance can degrade when evaluated on novel datasets, even when types remain constant. Large language models have exhibited strong zero-shot classification performance on a wide range of tasks, and in this paper we explore their use for CTA. We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner. We ablate each component of our method separately, and establish that improvements to context sampling and label remapping provide the most consistent gains. ArcheType establishes a new state-of-the-art performance on zero-shot CTA benchmarks (including three new domain-specific benchmarks which we release along with this paper), and when used in conjunction with classical CTA techniques, it outperforms a SOTA DoDuo model on the fine-tuned SOTAB benchmark.
  6. Recently, Bessa et al. (PODS 2023) showed that sketches based on coordinated weighted sampling theoretically and empirically outperform popular linear sketching methods like Johnson-Lindenstrauss projection and CountSketch for the ubiquitous problem of inner product estimation. We further develop this finding by introducing and analyzing two alternative sampling-based methods. In contrast to the computationally expensive algorithm in Bessa et al., our methods run in linear time (to compute the sketch) and perform better in practice, significantly beating linear sketching on a variety of tasks. For example, they provide state-of-the-art results for estimating the correlation between columns in unjoined tables, a problem that we show how to reduce to inner product estimation in a black-box way. While based on known sampling techniques (threshold and priority sampling), we introduce significant new theoretical analysis to prove approximation guarantees for our methods.
  7. We prove a tight upper bound on the variance of the priority sampling method (aka sequential Poisson sampling). Our proof is significantly shorter and simpler than the original proof given by Mario Szegedy at STOC 2006, which resolved a conjecture by Duffield, Lund, and Thorup. 
  8. We present a new approach for independently computing compact sketches that can be used to approximate the inner product between pairs of high-dimensional vectors. Based on the Weighted MinHash algorithm, our approach admits strong accuracy guarantees that improve on the guarantees of popular linear sketching approaches for inner product estimation, such as CountSketch and Johnson-Lindenstrauss projection. Specifically, while our method exactly matches linear sketching for dense vectors, it yields significantly lower error for sparse vectors with limited overlap between non-zero entries. Such vectors arise in many applications involving sparse data, as well as in increasingly popular dataset search applications, where inner products are used to estimate data covariance, conditional means, and other quantities involving columns in unjoined tables. We complement our theoretical results by showing that our approach empirically outperforms existing linear sketches and unweighted hashing-based sketches for sparse vectors. 
  9. The Diversity, Equity and Inclusion (DEI) initiative started as the Diversity/Inclusion initiative in 2020 [4]. The current report summarizes our activities in 2023. 
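
Records 6 and 7 above refer to priority sampling. As a rough illustration only, the following is a minimal sketch of the classical priority sampling estimator for subset sums, not the code or the analysis from either paper. The function name `priority_sample` and its parameters are hypothetical, chosen for this example. Each positive weight w gets a priority w / u with u drawn uniformly from (0, 1]; the k highest-priority items are kept, and each kept item is assigned the estimate max(w, tau), where tau is the (k+1)-th highest priority.

```python
import random


def priority_sample(weights, k, rng=None):
    """Return {index: estimated_weight} for a priority sample of size k.

    Hypothetical helper for illustration; items with non-positive
    weight are never sampled.
    """
    rng = rng or random.Random()
    # 1 - rng.random() lies in (0, 1], so the division is always safe.
    priorities = [
        (w / (1.0 - rng.random()), i)
        for i, w in enumerate(weights)
        if w > 0
    ]
    if len(priorities) <= k:
        # Everything fits in the sample, so the weights are exact.
        return {i: weights[i] for _, i in priorities}
    priorities.sort(reverse=True)
    tau = priorities[k][0]  # the (k+1)-th largest priority
    # Each kept item is estimated as max(weight, tau).
    return {i: max(weights[i], tau) for _, i in priorities[:k]}


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0, 10.0]
    sample = priority_sample(weights, k=3, rng=random.Random(0))
    # The sum of the estimates is an unbiased estimate of sum(weights).
    print(sample, sum(sample.values()))
```

Summing the estimates over any subset of indices gives an unbiased estimate of that subset's total weight; averaging over many independent runs should therefore approach the true sum.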